Clustering for Data Reduction: A Divide and Conquer Approach

Authors

  • Nicholas O. Andrews
  • Edward A. Fox
Abstract

We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where it runs an order of magnitude faster than searching for prototypes over the entire dataset. Furthermore, our “divide and conquer” approach is actually more accurate on datasets that are well bisected, because the greedy decisions of affinity propagation are confined to classes of already similar items.
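The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions: toy 2-D points instead of text data, negative squared Euclidean distance as the similarity, the median off-diagonal similarity as the affinity propagation preference, a size cap (`max_size`) as the stopping rule for bisection, and a medoid fallback when affinity propagation reports no exemplar. The spectral-cut step is omitted. All function names are hypothetical.

```python
def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def two_means(pts, iters=20):
    """Split pts in two with 2-means (deterministic farthest-point init)."""
    c0 = pts[0]
    c1 = max(pts, key=lambda p: sqdist(p, c0))
    for _ in range(iters):
        left = [p for p in pts if sqdist(p, c0) <= sqdist(p, c1)]
        right = [p for p in pts if sqdist(p, c0) > sqdist(p, c1)]
        if not left or not right:  # degenerate split: fall back to halving
            half = len(pts) // 2
            return pts[:half], pts[half:]
        c0, c1 = mean(left), mean(right)
    return left, right

def bisecting_split(points, max_size):
    """Divide: bisect until every cluster has at most max_size points."""
    todo, done = [list(points)], []
    while todo:
        c = todo.pop()
        if len(c) <= max_size:
            done.append(c)
        else:
            todo.extend(two_means(c))
    return done

def affinity_propagation(S, damping=0.5, iters=200):
    """Return exemplar indices for similarity matrix S (needs len(S) >= 2)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != first)
            for k in range(n):
                new = S[i][k] - (second if k == first else vals[first])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [k for k in range(n) if R[k][k] + A[k][k] > 0]

def prototypes(points, max_size):
    """Conquer: run affinity propagation inside each bisected cluster."""
    result = []
    for cluster in bisecting_split(points, max_size):
        n = len(cluster)
        if n < 2:
            result.extend(cluster)
            continue
        S = [[-sqdist(p, q) for q in cluster] for p in cluster]
        off = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
        for i in range(n):
            S[i][i] = off[len(off) // 2]  # assumed preference: median similarity
        found = affinity_propagation(S)
        if not found:  # assumed fallback: use the cluster medoid
            found = [max(range(n),
                         key=lambda k: sum(S[i][k] for i in range(n) if i != k))]
        result.extend(cluster[k] for k in found)
    return result

# Toy data: four tight groups of five points each.
offsets = [(0.0, 0.0), (0.3, 0.1), (-0.2, 0.4), (0.1, -0.3), (-0.4, -0.1)]
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
clusters = bisecting_split(points, 6)
protos = prototypes(points, 6)
```

Because affinity propagation costs roughly quadratic time and memory in the number of items it sees, running it on small clusters rather than on the whole dataset is where a speedup of the kind the abstract reports would come from in this sketch.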

Related articles

Clustering in WSN Based on Minimum Spanning Tree Using Divide and Conquer Approach

Due to heavy energy constraints in WSNs, clustering is an efficient way to manage the energy in sensors. Many clustering methods have already been proposed, and research continues on making clustering more energy efficient. In this paper we propose minimum spanning tree based clustering using a divide and conquer approach. The MST based clustering was first proposed in 1...

Knowledge Reduction Based on Divide and Conquer Method in Rough Set Theory

The divide and conquer method is a typical granular computing method using multiple levels of abstraction and granulation. Although some results based on the divide and conquer method have been obtained in rough set theory, systematic methods for knowledge reduction based on divide and conquer are still absent. In this paper, the knowledge reduction approaches based on...

Free Vibration Analysis of Repetitive Structures using Decomposition, and Divide-Conquer Methods

This paper consists of three sections. In the first section, an efficient method is used for decomposition of the canonical matrices associated with repetitive structures. To this end, a cylindrical coordinate system and a special numbering scheme are employed. In the second section, the divide and conquer method is used for eigensolution of these structures, where the matrices are in ...

A Novel K means Clustering Algorithm for Large Datasets Based on Divide and Conquer Technique

In this paper we propose an efficient algorithm based on the divide and conquer technique for clustering large datasets. We apply the divide and conquer technique to partitions of the large datasets and use squared Euclidean distance to measure the similarity between data points. The partitioning of datasets is done according to the number of cluster...

Speaker diarization using divide-and-conquer

Speaker diarization systems usually consist of two core components: speaker segmentation and speaker clustering. The current state-of-the-art speaker diarization systems usually apply hierarchical agglomerative clustering (HAC) for speaker clustering after segmentation. However, HAC’s quadratic computational complexity with respect to the number of data samples inevitably limits its application...

Publication date: 2007